A semi-supervised short text sentiment classification method based on improved Bert model from unlabelled data

Abstract

Short text information has considerable commercial value and immeasurable social value. Natural language processing and short text sentiment analysis technology can organize and analyze short text information on the Internet. Natural language processing tasks such as sentiment classification have achieved satisfactory performance under a supervised learning framework. However, traditional supervised learning relies on large-scale, high-quality manual labels, and obtaining high-quality label data costs a lot. Therefore, the strong dependence on label data hinders the application of the deep learning model to a large extent, which is the bottleneck of supervised learning. At the same time, short text datasets such as product reviews have an imbalance in the distribution of data samples. To solve the above problems, this paper proposes a method to predict label data according to a semi-supervised learning mode and implements the MixMatchNL data enhancement method. Meanwhile, the Bert pre-training model is updated. The cross-entropy loss function in the model is improved to the Focal Loss function to alleviate the data imbalance in short text datasets. Experimental results based on public datasets indicate that the proposed model improves the accuracy of short text sentiment recognition compared with the previous update and other state-of-the-art models.
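The abstract says the cross-entropy loss is replaced by Focal Loss to ease class imbalance. The sketch below illustrates that general idea in plain Python and is not the paper's implementation. The helper names (`softmax`, `cross_entropy`, `focal_loss`), the example logits, and the value `gamma=2.0` are assumptions for illustration; the abstract does not state the hyperparameters the authors used.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Standard cross-entropy for one example: -log(p_t)."""
    return -math.log(softmax(logits)[target])

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma (assumed 2.0 here) down-weights examples the model already
    classifies confidently; alpha is an optional per-class weight list.
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical logits for a two-class sentiment task, true class = 0.
easy = [4.0, 0.0]  # model is confidently correct
hard = [0.0, 1.0]  # model leans toward the wrong class

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

With `gamma=0` and no `alpha`, the focal loss reduces to ordinary cross-entropy. Larger `gamma` shrinks the contribution of easy examples, so training gradients are dominated less by the well-classified majority class.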

Similar resources

Semi-Stacking for Semi-supervised Sentiment Classification

In this paper, we address semi-supervised sentiment learning via semi-stacking, which integrates two or more semi-supervised learning algorithms from an ensemble learning perspective. Specifically, we apply meta-learning to predict the unlabeled data given the outputs from the member algorithms and propose N-fold cross validation to guarantee a suitable size of the data for training the meta-cla...

LCCT: A Semi-supervised Model for Sentiment Classification

Analyzing public opinions towards products, services and social events is an important but challenging task. An accurate sentiment analyzer should take both lexicon-level information and corpus-level information into account. It also needs to exploit the domain-specific knowledge and utilize the common knowledge shared across domains. In addition, we want the algorithm to be able to deal with mi...

Short Text Classification Based on Improved ITC

Long text classification has made great progress, but short text classification still needs improvement. In this paper, we first describe why we select the ITC feature selection algorithm rather than the conventional TFIDF and the superiority of ITC over TFIDF, then we summarize the flaws of the conventional ITC algorithm, and then we present an improved ITC feature selec...

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...

Semi-supervised Collaborative Text Classification

Most text categorization methods require text content of documents that is often difficult to obtain. We consider “Collaborative Text Categorization”, where each document is represented by the feedback from a large number of users. Our study focuses on the semi-supervised case in which one key challenge is that a significant number of users have not rated any labeled document. To address this pr...

Journal

Journal title: Journal of Big Data

Year: 2023

ISSN: 2196-1115

DOI: https://doi.org/10.1186/s40537-023-00710-x